Due on Monday, November 2nd by 5:00 pm (17:00), UK local time.
Before you start be sure to read through all of the rules and instructions in the README.md.
library(tidyverse)
library(kableExtra)
library(gganimate)
library(gifski)
f1 = readRDS(file="data/f1.rds")
The f1 data is a hierarchical (nested) list whose top-level element is MRData. Going further down the structure we reach the elements that make up MRData, and below those, their sub-elements. Our aim is a data frame with the columns race_name, round, date, driver, constructor, position and points, so each extraction step pulls out only the fields needed to build them. We use hoist to extract one level at a time. At level 1 we pull MRData, and at level 2 we pull total and RaceTable from MRData. At level 3 we use hoist again to pull season and Races from RaceTable. Since Races is a list of races, we use unnest_longer to turn each element of the list column into its own row. At level 4 we use hoist to pull round, raceName, Circuit, date and Results from Races. At level 5.1 we pull circuitName from Circuit. Results is also a list, so we unnest it with unnest_longer to get one row per result. At level 5.2 we extract positionText, Driver, Constructor and points from Results. At level 6.2.1 we extract givenName, familyName and nationality from Driver. Finally, at level 6.2.2 we extract name (the constructor's name) from Constructor.
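The level-by-level extraction described above can be sketched on a much smaller list. This is only an illustration: `toy`, its field values and `toy_tidy` are hypothetical, and the toy list is wrapped in `list()` so that the single row holds the whole nested object, which may differ from how the real f1 object is stored.

```r
library(tidyr)   # hoist(), unnest_longer() and the re-exported %>% pipe
library(tibble)  # tibble()

# Hypothetical nested list that mimics the shape of the F1 data on a small scale
toy <- list(
  MRData = list(
    total = "2",
    RaceTable = list(
      season = "2019",
      Races = list(
        list(round = "1", raceName = "Race A"),
        list(round = "2", raceName = "Race B")
      )
    )
  )
)

toy_tidy <- tibble(all_data = list(toy)) %>%
  hoist(all_data, MRData = "MRData") %>%                 # level 1
  hoist(MRData, RaceTable = "RaceTable") %>%             # level 2
  hoist(RaceTable, season = "season", Races = "Races") %>% # level 3
  unnest_longer(Races) %>%                               # one row per race
  hoist(Races, round = "round", raceName = "raceName")   # level 4

toy_tidy$raceName
```

Each hoist call lifts named components out of a list column into ordinary columns, while unnest_longer is what changes the number of rows.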
Since we need a data frame with columns named driver, constructor, race_name and position, we add these columns to the data frame: driver is built by pasting givenName and familyName together, and the other three are copies of name, raceName and positionText.
Next, we change the class of round, position and points to integer, and date to the Date class. To avoid the warnings that coercion would otherwise raise, we replace the non-numeric values of position ('R' and 'D') with NA before converting it to integer.
Finally, we select the columns we want: race_name, round, date, driver, constructor, position and points, and use slice_head to show only the first 10 rows.
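The coercion step described above can be illustrated in isolation. The vector `position_text` below is a hypothetical stand-in for the positionText column, not data from the file.

```r
# Hypothetical position values: numbers stored as text plus two status codes
position_text <- c("1", "2", "R", "D", "10")

# Calling as.integer(position_text) directly would also give NA for "R" and "D",
# but with an "NAs introduced by coercion" warning.
# Replacing the known non-numeric codes first keeps the conversion silent.
position_clean <- replace(position_text, position_text %in% c("R", "D"), NA)
position_int <- as.integer(position_clean)

position_int
```

Listing the codes explicitly also means that any unexpected non-numeric value would still trigger a warning rather than being hidden.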
f1_tidy = tibble(all_data = f1) %>%
  hoist(all_data, # level 1: pull MRData
    MRData = "MRData"
  ) %>%
  hoist(MRData, # level 2: pull total and RaceTable from MRData
    total = "total",
    RaceTable = "RaceTable"
  ) %>%
  hoist(RaceTable, # level 3: pull season and Races from RaceTable
    season = "season",
    Races = "Races"
  ) %>%
  tidyr::unnest_longer(Races) %>% # one row per race
  hoist(Races, # level 4: pull round, raceName, Circuit, date and Results from Races
    round = "round",
    raceName = "raceName",
    Circuit = "Circuit",
    date = "date",
    Results = "Results"
  ) %>%
  hoist(Circuit, # level 5.1: pull circuitName from Circuit
    circuitName = "circuitName"
  ) %>%
  tidyr::unnest_longer(Results) %>% # one row per result
  hoist(Results, # level 5.2: pull positionText, Driver, Constructor and points from Results
    positionText = "positionText",
    Driver = "Driver",
    Constructor = "Constructor",
    points = "points"
  ) %>%
  hoist(Driver, # level 6.2.1: pull givenName, familyName and nationality from Driver
    givenName = "givenName",
    familyName = "familyName",
    nationality = "nationality"
  ) %>%
  hoist(Constructor, # level 6.2.2: pull name (of the constructor) from Constructor
    name = "name"
  ) %>%
  mutate(points = as.numeric(points)) # convert character points to numeric points
# Add the columns needed for the desired data frame
f1_tidy$driver <- paste(f1_tidy$givenName, f1_tidy$familyName, sep = " ")
f1_tidy$constructor <- f1_tidy$name
f1_tidy$race_name <- f1_tidy$raceName
f1_tidy$position <- f1_tidy$positionText
# Convert the non-numeric position values to NA so that coercion to integer does not raise a warning
f1_tidy <- f1_tidy %>% mutate(position = replace(position, position %in% c('R', 'D'), NA))
# Convert round, position and points from character/numeric to integer, and date to the Date class
f1_tidy %>%
  select(race_name, round, date, driver, constructor, position, points) %>%
  mutate(
    round = as.integer(round),
    position = as.integer(position),
    points = as.integer(points),
    date = as.Date(date)
  ) %>%
  slice_head(n = 10)
## # A tibble: 10 x 7
## race_name round date driver constructor position points
## <chr> <int> <date> <chr> <chr> <int> <int>
## 1 Australian Grand … 1 2019-03-17 Valtteri Bot… Mercedes 1 26
## 2 Australian Grand … 1 2019-03-17 Lewis Hamilt… Mercedes 2 18
## 3 Australian Grand … 1 2019-03-17 Max Verstapp… Red Bull 3 15
## 4 Australian Grand … 1 2019-03-17 Sebastian Ve… Ferrari 4 12
## 5 Australian Grand … 1 2019-03-17 Charles Lecl… Ferrari 5 10
## 6 Australian Grand … 1 2019-03-17 Kevin Magnus… Haas F1 Te… 6 8
## 7 Australian Grand … 1 2019-03-17 Nico Hülkenb… Renault 7 6
## 8 Australian Grand … 1 2019-03-17 Kimi Räikkön… Alfa Romeo 8 4
## 9 Australian Grand … 1 2019-03-17 Lance Stroll Racing Poi… 9 2
## 10 Australian Grand … 1 2019-03-17 Daniil Kvyat Toro Rosso 10 1
For task 2, we need a table showing each driver's finishing position in every race together with their overall points total for the season, for all 20 drivers. We start by building a data frame with three kinds of columns: the driver's name, one column per race (holding the driver's finishing position in that race) and the sum of all points the driver earned. We first replace the non-integer values of position with NA and then coerce position to integer. Since the race columns should appear in chronological order, we arrange by driver and date. We then use summarise to total the points for each driver, and arrange in decreasing order of points. Next we use pivot_wider to spread the finishing positions into one column per race. Finally, to print a nicely formatted version of the completed table, with the driver names, race names and total points as its columns, we use the kable function made available by the kableExtra package.
# Convert the non-numeric position values (R and D) to NA to avoid the warning raised by coercion
f1_tidy <- f1_tidy %>% mutate(position = replace(position, position %in% c('R', 'D'), NA))
f1_tidy %>%
  mutate(position = as.integer(position)) %>%
  arrange(driver, date) %>% # arrange by driver and date since we want the results in chronological order
  group_by(driver) %>%
  summarise(
    points = sum(points), # total points per driver; keep position and race_name as list columns
    position = list(position),
    race_name = list(race_name),
    .groups = "drop"
  ) %>%
  unnest(c(position, race_name)) %>% # unnest position and race_name back to one row per race
  arrange(desc(points)) %>% # arrange total points in descending order
  pivot_wider(names_from = race_name, values_from = position) %>% # one column per race, holding the finishing position
  select(driver, ends_with('Prix'), points) -> q2 # keep the driver column, the race columns and the total points
kable(q2)
| driver | Australian Grand Prix | Bahrain Grand Prix | Chinese Grand Prix | Azerbaijan Grand Prix | Spanish Grand Prix | Monaco Grand Prix | Canadian Grand Prix | French Grand Prix | Austrian Grand Prix | British Grand Prix | German Grand Prix | Hungarian Grand Prix | Belgian Grand Prix | Italian Grand Prix | Singapore Grand Prix | Russian Grand Prix | Japanese Grand Prix | Mexican Grand Prix | United States Grand Prix | Brazilian Grand Prix | Abu Dhabi Grand Prix | points |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Lewis Hamilton | 2 | 1 | 1 | 2 | 1 | 1 | 1 | 1 | 5 | 1 | 9 | 1 | 2 | 3 | 4 | 1 | 3 | 1 | 2 | 7 | 1 | 413 |
| Valtteri Bottas | 1 | 2 | 2 | 1 | 2 | 3 | 4 | 2 | 3 | 2 | NA | 8 | 3 | 2 | 5 | 2 | 1 | 3 | 1 | NA | 4 | 326 |
| Max Verstappen | 3 | 4 | 4 | 4 | 3 | 4 | 5 | 4 | 1 | 5 | 1 | 2 | NA | 8 | 3 | 4 | NA | 6 | 3 | 1 | 2 | 278 |
| Charles Leclerc | 5 | 3 | 5 | 5 | 5 | NA | 3 | 3 | 2 | 3 | NA | 4 | 1 | 1 | 2 | 3 | 6 | 4 | 4 | 18 | 3 | 264 |
| Sebastian Vettel | 4 | 5 | 3 | 3 | 4 | 2 | 2 | 5 | 4 | 16 | 2 | 3 | 4 | 13 | 1 | NA | 2 | 2 | NA | 17 | 5 | 240 |
| Carlos Sainz | NA | NA | 14 | 7 | 8 | 6 | 11 | 6 | 8 | 6 | 5 | 5 | NA | NA | 12 | 6 | 5 | 13 | 8 | 3 | 10 | 96 |
| Pierre Gasly | 11 | 8 | 6 | NA | 6 | 5 | 8 | 10 | 7 | 4 | 14 | 6 | 9 | 11 | 8 | 14 | 7 | 9 | 16 | 2 | 18 | 95 |
| Alexander Albon | 14 | 9 | 10 | 11 | 11 | 8 | NA | 15 | 15 | 12 | 6 | 10 | 5 | 6 | 6 | 5 | 4 | 5 | 5 | 14 | 6 | 92 |
| Daniel Ricciardo | NA | NA | 7 | NA | 12 | 9 | 6 | 11 | 12 | 7 | NA | 14 | 14 | 4 | 14 | NA | NA | 8 | 6 | 6 | 11 | 54 |
| Sergio Pérez | 13 | 10 | 8 | 6 | 15 | 13 | 12 | 12 | 11 | 17 | NA | 11 | 6 | 7 | NA | 7 | 8 | 7 | 10 | 9 | 7 | 52 |
| Lando Norris | 12 | 6 | 18 | 8 | NA | 11 | NA | 9 | 6 | 11 | NA | 9 | 11 | 10 | 7 | 8 | 11 | NA | 7 | 8 | 8 | 49 |
| Kimi Räikkönen | 8 | 7 | 9 | 10 | 14 | 17 | 15 | 7 | 9 | 8 | 12 | 7 | 16 | 15 | NA | 13 | 12 | NA | 11 | 4 | 13 | 43 |
| Daniil Kvyat | 10 | 12 | NA | NA | 9 | 7 | 10 | 14 | 17 | 9 | 3 | 15 | 7 | NA | 15 | 12 | 10 | 11 | 12 | 10 | 9 | 37 |
| Nico Hülkenberg | 7 | NA | NA | 14 | 13 | 14 | 7 | 8 | 13 | 10 | NA | 12 | 8 | 5 | 9 | 10 | NA | 10 | 9 | 15 | 12 | 37 |
| Lance Stroll | 9 | 14 | 12 | 9 | NA | 16 | 9 | 13 | 14 | 13 | 4 | 17 | 10 | 12 | 13 | 11 | 9 | 12 | 13 | 19 | NA | 21 |
| Kevin Magnussen | 6 | 13 | 13 | 13 | 7 | 12 | 17 | 17 | 19 | NA | 8 | 13 | 12 | NA | 17 | 9 | 15 | 15 | 18 | 11 | 14 | 20 |
| Antonio Giovinazzi | 15 | 11 | 15 | 12 | 16 | 19 | 13 | 16 | 10 | NA | 13 | 18 | 18 | 9 | 10 | 15 | 14 | 14 | 14 | 5 | 16 | 14 |
| Romain Grosjean | NA | NA | 11 | NA | 10 | 10 | 14 | 20 | 16 | NA | 7 | NA | 13 | 16 | 11 | NA | 13 | 17 | 15 | 13 | 15 | 8 |
| Robert Kubica | 17 | 16 | 17 | 16 | 18 | 18 | 18 | 18 | 20 | 15 | 10 | 19 | 17 | 17 | 16 | NA | 16 | 18 | NA | 16 | 19 | 1 |
| George Russell | 16 | 15 | 16 | 15 | 17 | 15 | 16 | 19 | 18 | 14 | 11 | 16 | 15 | 14 | NA | NA | 16 | 16 | 17 | 12 | 17 | 0 |
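The reshaping behind the driver table can be sketched on a toy data frame. This is a variation rather than the code used above: it attaches the total with mutate instead of summarise with list columns, and `toy_results`, `total_points` and the race names are hypothetical.

```r
library(dplyr)
library(tidyr)

# Hypothetical long-format results: one row per driver and race
toy_results <- tibble::tibble(
  driver    = c("A", "A", "B", "B"),
  race_name = c("First Grand Prix", "Second Grand Prix",
                "First Grand Prix", "Second Grand Prix"),
  position  = c(1L, NA, 2L, 1L),
  points    = c(25, 0, 18, 25)
)

toy_wide <- toy_results %>%
  group_by(driver) %>%
  mutate(total_points = sum(points)) %>%  # season total repeated on each row
  ungroup() %>%
  select(-points) %>%                     # per-race points are no longer needed
  pivot_wider(names_from = race_name, values_from = position) %>%
  arrange(desc(total_points)) %>%
  relocate(total_points, .after = last_col())  # move the total to the last column

toy_wide
```

Because total_points is constant within a driver, pivot_wider treats it as an identifying column and produces one row per driver.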
For task 3, we need a table of cumulative points after each race for each constructor. Our objective is a table with 22 columns: the constructor name plus one column for each of the 21 races. We first convert the race name to a factor whose levels follow the order in which the races appear in the data, so that the races keep that order, and group the data frame by constructor name and race name. We use summarise to compute the total points each constructor scored in each race, then mutate with cumsum to turn these into cumulative totals. Finally, we use pivot_wider to get one column of cumulative points per race. To print a nicely formatted version of the completed table, with the constructor names and race names (holding the cumulative points of each constructor after that race), we use the kable function made available by the kableExtra package.
q3 <- f1_tidy %>%
  mutate(raceName = factor(raceName, unique(raceName))) %>% # convert raceName to a factor with levels in order of first appearance
  group_by(name, raceName) %>% # group the data frame by constructor name and raceName
  summarise(points = sum(points), .groups = "drop_last") %>% # total points per constructor and race; keep the grouping by constructor
  mutate(points = cumsum(points)) %>% # replace points with the cumulative sum within each constructor
  pivot_wider(names_from = raceName, values_from = points) # one column per race, holding the cumulative points
kable(q3)
| name | Australian Grand Prix | Bahrain Grand Prix | Chinese Grand Prix | Azerbaijan Grand Prix | Spanish Grand Prix | Monaco Grand Prix | Canadian Grand Prix | French Grand Prix | Austrian Grand Prix | British Grand Prix | German Grand Prix | Hungarian Grand Prix | Belgian Grand Prix | Italian Grand Prix | Singapore Grand Prix | Russian Grand Prix | Japanese Grand Prix | Mexican Grand Prix | United States Grand Prix | Brazilian Grand Prix | Abu Dhabi Grand Prix |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Alfa Romeo | 4 | 10 | 12 | 13 | 13 | 13 | 13 | 19 | 22 | 26 | 26 | 32 | 32 | 34 | 35 | 35 | 35 | 35 | 35 | 57 | 57 |
| Ferrari | 22 | 48 | 73 | 99 | 121 | 139 | 172 | 198 | 228 | 243 | 261 | 288 | 326 | 351 | 394 | 409 | 435 | 466 | 479 | 479 | 504 |
| Haas F1 Team | 8 | 8 | 8 | 8 | 15 | 16 | 16 | 16 | 16 | 16 | 26 | 26 | 26 | 26 | 26 | 28 | 28 | 28 | 28 | 28 | 28 |
| McLaren | 0 | 8 | 8 | 18 | 22 | 30 | 30 | 40 | 52 | 60 | 70 | 82 | 82 | 83 | 89 | 101 | 111 | 111 | 121 | 140 | 145 |
| Mercedes | 44 | 87 | 130 | 173 | 217 | 257 | 295 | 338 | 363 | 407 | 409 | 438 | 471 | 505 | 527 | 571 | 612 | 652 | 695 | 701 | 739 |
| Racing Point | 2 | 3 | 7 | 17 | 17 | 17 | 19 | 19 | 19 | 19 | 31 | 31 | 40 | 46 | 46 | 52 | 58 | 64 | 65 | 67 | 73 |
| Red Bull | 15 | 31 | 52 | 64 | 87 | 110 | 124 | 137 | 169 | 191 | 217 | 244 | 254 | 266 | 289 | 311 | 323 | 341 | 366 | 391 | 417 |
| Renault | 6 | 6 | 12 | 12 | 12 | 14 | 28 | 32 | 32 | 39 | 39 | 39 | 43 | 65 | 67 | 68 | 68 | 73 | 83 | 91 | 91 |
| Toro Rosso | 1 | 3 | 4 | 4 | 6 | 16 | 17 | 17 | 17 | 19 | 42 | 43 | 51 | 51 | 55 | 55 | 62 | 64 | 64 | 83 | 85 |
| Williams | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
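The cumulative totals in this table come from a grouped cumsum. A minimal sketch on hypothetical data (`toy_points` and its values are made up for illustration):

```r
library(dplyr)

# Hypothetical per-round points for two constructors
toy_points <- tibble::tibble(
  name   = c("X", "X", "X", "Y", "Y", "Y"),
  round  = c(1L, 2L, 3L, 1L, 2L, 3L),
  points = c(10, 0, 5, 3, 8, 1)
)

toy_cum <- toy_points %>%
  arrange(name, round) %>%                 # cumsum depends on row order
  group_by(name) %>%
  mutate(cum_points = cumsum(points)) %>%  # running total restarts for each constructor
  ungroup()

toy_cum$cum_points
```

The running total is only meaningful if the rows are in race order within each group, which is why the rows are arranged before cumsum is applied.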
For task 4, we use a bar plot to see which constructor earned the most points over the entire F1 season. The plot shows lopsided results: only Mercedes, Ferrari and Red Bull scored more than 400 points, while every other constructor finished far behind.
To see how the drivers performed over the season, we use a second bar plot of total points per driver. As expected given the lopsided constructor results, driver performance was lopsided too: Lewis Hamilton, Valtteri Bottas, Max Verstappen, Charles Leclerc and Sebastian Vettel dominated, each scoring more than 200 points, and Lewis Hamilton won the season with more than 400 points. Every other driver scored fewer than 100 points.
Inspiration for the write-up: http://samples.leanpub.com/wranglingf1datawithr-sample.pdf (referenced chapter: Point Performance Chart).
We make a third, animated plot to see how the constructors accumulated points over the rounds of the season, showing each constructor's cumulative points after each round. As the number of rounds increases, Mercedes, Ferrari and Red Bull pull away from the rest, as is also visible in plot 1. Each colored line shows the cumulative points of one constructor, and a dotted grey line connects the tip of each line to the constructor's name at the right edge of the plot. The legend on the right shows which color corresponds to which constructor.
# 1st plot
p1 <- f1_tidy %>%
  group_by(constructor) %>%
  summarise(points = sum(points), .groups = "drop_last") %>% # total points scored by each constructor
  ggplot() + aes(constructor, points) + geom_col() + # constructor on the x axis and points on the y axis
  ggtitle('Constructor wise performance')
p1
# 2nd plot
p2 <- f1_tidy %>%
  group_by(driver) %>%
  summarise(points = sum(points), .groups = "drop_last") %>% # total points scored by each driver
  ggplot() + aes(driver, points) + geom_col() + # driver on the x axis and points on the y axis
  ggtitle('Driver wise performance') +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) # rotate the driver names so they do not overlap
p2
# 3rd plot
f1_tidy %>%
  mutate(round = factor(round, unique(round))) %>% # convert round to a factor with levels in order of first appearance
  group_by(name, round) %>% # group by constructor name and round
  summarise(points = sum(points), .groups = "drop_last") %>% # total points per constructor and round
  mutate(points = cumsum(points)) %>% # cumulative points per constructor across rounds
  mutate(round = as.integer(round)) %>% # convert round back to integer
  select(round, points, name) -> p3 # keep round, points and name
ggplot(data = p3, mapping = aes(x = round, y = points, group = name)) + # round on the x axis, points on the y axis, grouped by constructor name
  geom_line(mapping = aes(color = name), alpha = 0.7) + # line plot colored by constructor name
  geom_segment(aes(xend = 21, yend = points), linetype = 2, color = "grey") + # dotted segment from the current point to the last round (x = 21)
  geom_point(size = 0.5, alpha = 0.5) + # point at the tip of each line showing the cumulative points in the current round
  geom_text(aes(x = 21.1, label = name), size = 2) + # constructor name just beyond the last round
  gganimate::transition_reveal(round) + # reveal the lines gradually, round by round
  labs(title = "Constructor's Standing", # title, y axis label and legend title
       y = "Constructor Points",
       color = "Constructor") +
  coord_cartesian(clip = "off") # turn off clipping so the labels can be drawn outside the plot panel
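The bar charts in plots 1 and 2 list the categories alphabetically. If the bars should instead run from highest to lowest total, one option is to reorder the factor before plotting. A minimal sketch, assuming a hypothetical summary data frame `toy_totals` with made-up team names and values:

```r
library(ggplot2)

# Hypothetical season totals for three constructors
toy_totals <- data.frame(
  constructor = c("Team A", "Team B", "Team C"),
  points      = c(57, 739, 145)
)

# reorder() sorts the factor levels by increasing value of its second argument,
# so negating points puts the highest total first
toy_totals$constructor <- reorder(toy_totals$constructor, -toy_totals$points)

p_sorted <- ggplot(toy_totals, aes(constructor, points)) +
  geom_col() +
  ggtitle("Constructor totals, sorted by points")

levels(toy_totals$constructor)
```

Sorting the bars makes the gap between the leading teams and the rest easier to read, at the cost of losing the alphabetical lookup order.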